aCaMEL Nextflow DSL2 pipeline for designing GTSeq assays from genotyped SNP data
This report has been generated by the aCaMEL/gtseqdesign analysis pipeline.
Report generated on 2025-06-09, 20:38 UTC, based on data in:
/Users/tyler/projects/COL/gtseq_final/work/f1/e8022337ed2027c014595188f9bd3d
Panel validation
Analysis carried out to explore genetic structure of populations before and after panel generation.
Best K by cross-validation
Cross-validation (CV) error plot for the ADMIXTURE analysis run on the full filtered dataset.
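As an illustration of how a best K can be read off cross-validation results, the sketch below parses the `CV error (K=...)` lines that ADMIXTURE prints when run with `--cv` and picks the K with the lowest error. It assumes a single run per K; the pipeline's own BESTK and CVSUM steps may summarise replicates differently, and the log excerpt is hypothetical.

```python
import re

# Pattern for the CV line ADMIXTURE prints when run with --cv,
# e.g. "CV error (K=3): 0.52311".
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(log_text):
    """Return {K: CV error} for every CV line found in the log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Pick the K with the lowest CV error (ties go to the smaller K)."""
    return min(sorted(cv_errors), key=lambda k: cv_errors[k])


# Hypothetical log excerpt, for illustration only (not values from this run).
example_log = """
CV error (K=1): 0.61250
CV error (K=2): 0.54010
CV error (K=3): 0.55872
"""

errors = parse_cv_errors(example_log)
print("Best K:", best_k(errors))
```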
ADMIXTURE Assignments for Full Dataset
Stacked barplot showing ancestry proportions for each individual before SNP panel selection.
ADMIXTURE Assignments for Selected Panel
Stacked barplot showing ancestry proportions for each individual after SNP panel selection.
Regression of Assignment Probabilities
Comparison of each individual's minimum and maximum assignment probabilities before and after filtering.
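A minimal sketch of how such a regression could be computed: take each individual's largest ancestry proportion from the full-dataset and panel Q-matrices, then fit an ordinary least-squares line. Placing the pre-filtering values on the x-axis is an assumption, and the Q-matrix rows are hypothetical; the pipeline's COMPARE_ADMIXTURE step may be implemented differently.

```python
def max_assignment(q_rows):
    """Largest ancestry proportion for each individual (one row per individual)."""
    return [max(row) for row in q_rows]


def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical Q-matrix rows (illustration only, not values from this run).
q_pre = [[0.98, 0.02], [0.60, 0.40], [0.75, 0.25]]
q_post = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]

slope, intercept, r2 = linear_fit(max_assignment(q_pre), max_assignment(q_post))
print(f"slope={slope:.2f} intercept={intercept:.2f} R2={r2:.2f}")
```

The same fit applied to `min(row)` instead of `max(row)` would give the minimum-assignment comparison.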
Entropy of Assignment Probabilities
Per-sample entropy in assignment probabilities before and after filtering.
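For orientation, per-sample entropy can be sketched as the Shannon entropy of an individual's ancestry proportions: 0 when the individual is assigned entirely to one cluster, and largest when ancestry is spread evenly. The natural-log base and the Q-matrix rows below are assumptions for illustration; the report does not state which base the pipeline uses.

```python
import math


def assignment_entropy(q_row):
    """Shannon entropy (natural log) of one individual's ancestry proportions."""
    return -sum(p * math.log(p) for p in q_row if p > 0)


def mean_entropy(q_rows):
    """Average per-sample entropy across all individuals."""
    return sum(assignment_entropy(row) for row in q_rows) / len(q_rows)


# Hypothetical Q-matrix rows (illustration only, not values from this run).
q_pre = [[0.98, 0.01, 0.01], [0.50, 0.45, 0.05]]
q_post = [[0.90, 0.05, 0.05], [0.40, 0.40, 0.20]]

# A positive delta means assignments became less certain after panel selection.
delta = mean_entropy(q_post) - mean_entropy(q_pre)
print(f"Delta entropy (post - pre): {delta:.3f}")
```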
Summary of Filtering Effects on Admixture
Metrics summarizing the effect of filtering on ADMIXTURE-based assignment.
| Sample Name | R² (Max Assignment) | Slope (Max Assignment) | R² (Min Assignment) | Slope (Min Assignment) | Spearman Correlation (Entropy) | Mean Entropy (Pre) | Mean Entropy (Post) | Delta Entropy | % Admixed (Pre) | % Admixed (Post) | Delta % Admixed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| summary | 0.9 | 1.1 | 0.2 | 0.7 | 1.0 | 0.4 | 0.5 | 0.1 | 0.4 | 0.4 | 0.1 |
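The report does not define how an individual is classified as admixed. One common convention, shown here purely as an assumption, is to call an individual admixed when its largest ancestry proportion falls below a threshold; the `threshold=0.9` default and the Q-matrix rows are hypothetical and may not match the pipeline's definition.

```python
def fraction_admixed(q_rows, threshold=0.9):
    """Fraction of individuals whose largest ancestry proportion is below threshold.

    The threshold is a hypothetical choice, not taken from the pipeline.
    """
    if not q_rows:
        return float("nan")
    admixed = sum(1 for row in q_rows if max(row) < threshold)
    return admixed / len(q_rows)


# Hypothetical Q-matrix rows (illustration only, not values from this run).
q_pre = [[0.97, 0.03], [0.60, 0.40], [0.92, 0.08], [0.55, 0.45]]
q_post = [[0.95, 0.05], [0.58, 0.42], [0.85, 0.15], [0.50, 0.50]]

pre = fraction_admixed(q_pre)
post = fraction_admixed(q_post)
print(f"admixed pre={pre:.2f} post={post:.2f} delta={post - pre:.2f}")
```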
Dataset Composition
Summary statistics and plots of dataset composition before panel design.
Summary of Filtering Steps
Sankey diagram displaying locus loss at each filtering step performed by SNPio.
Per-sample Summary Statistics
Per-sample heterozygosity and missing data, pre- and post-filtering.
| Sample Name | Missing_Pre | Missing_Post | Heterozygosity_Pre | Heterozygosity_Post |
|---|---|---|---|---|
| 28BACG01 | 0.8 | 0.0 | 0.3 | 0.3 |
| 28BACG02 | 0.8 | 0.2 | 0.2 | 0.3 |
| 28BACG03 | 0.7 | 0.0 | 0.2 | 0.3 |
| 28BACG04 | 0.8 | 0.1 | 0.2 | 0.3 |
| 28BATE02 | 0.9 | 0.6 | 0.3 | 0.3 |
| 28BATE06 | 0.8 | 0.2 | 0.3 | 0.3 |
| 28BATE16 | 0.3 | 0.8 | 0.2 | 0.3 |
| 28BNP01 | 0.7 | 0.0 | 0.2 | 0.3 |
| 28BNP03 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28CRDE01 | 0.7 | 0.0 | 0.2 | 0.3 |
| 28CRLC03 | 0.9 | 0.2 | 0.2 | 0.2 |
| 28CRLC04 | 0.9 | 0.5 | 0.2 | 0.2 |
| 28CRLC08 | 0.9 | 0.6 | 0.2 | 0.2 |
| 28CRLC12 | 0.9 | 0.7 | 0.3 | 0.4 |
| 28CRLC13 | 0.9 | 0.7 | 0.3 | 0.4 |
| 28CRLC14 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28CRLS02 | 0.8 | 0.1 | 0.1 | 0.2 |
| 28CRLS03 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28CRLS04 | 0.8 | 0.1 | 0.2 | 0.3 |
| 28CRLS06 | 0.9 | 0.7 | 0.1 | 0.1 |
| 28CRLS11 | 0.6 | 0.0 | 0.2 | 0.3 |
| 28CRLS12 | 0.9 | 0.7 | 0.2 | 0.2 |
| 28IZCP01 | 0.8 | 0.0 | 0.2 | 0.3 |
| 28IZCP02 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28IZCP03 | 0.8 | 0.1 | 0.2 | 0.3 |
| 28IZCP04 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28IZCP05 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28IZCP06 | 0.9 | 0.7 | 0.3 | 0.4 |
| 28IZCP07 | 0.9 | 0.6 | 0.2 | 0.2 |
| 28LOSL01 | 0.7 | 0.0 | 0.2 | 0.2 |
| 28LOSL02 | 0.7 | 0.0 | 0.1 | 0.2 |
| 28LOSL03 | 0.7 | 0.0 | 0.2 | 0.2 |
| 28LOSL04 | 0.7 | 0.1 | 0.2 | 0.2 |
| 28LOSL05 | 0.7 | 0.0 | 0.2 | 0.2 |
| 28LOSU01 | 0.8 | 0.1 | 0.2 | 0.3 |
| 28LOSU03 | 0.7 | 0.0 | 0.2 | 0.2 |
| 28LOSU04 | 0.7 | 0.0 | 0.2 | 0.2 |
| 28LOSU05 | 0.7 | 0.0 | 0.1 | 0.1 |
| 28LOSU16 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28LOSU33 | 0.7 | 0.1 | 0.1 | 0.2 |
| 28MRFR02 | 0.8 | 0.1 | 0.3 | 0.4 |
| 28MRFR05 | 0.9 | 0.2 | 0.3 | 0.3 |
| 28MRFR06 | 0.7 | 0.0 | 0.2 | 0.3 |
| 28MRFR09 | 0.7 | 0.0 | 0.2 | 0.4 |
| 28MRFR11 | 0.9 | 0.3 | 0.2 | 0.3 |
| 28MRRU01 | 0.8 | 0.1 | 0.3 | 0.4 |
| 28MRRU02 | 0.8 | 0.1 | 0.3 | 0.4 |
| 28MRRU03 | 0.8 | 0.1 | 0.3 | 0.4 |
| 28MRRU04 | 0.8 | 0.1 | 0.3 | 0.3 |
| 28MRRU05 | 0.8 | 0.0 | 0.3 | 0.4 |
| 28NWPR01 | 0.8 | 0.2 | 0.2 | 0.3 |
| 28NWPR03 | 0.9 | 0.2 | 0.1 | 0.2 |
| 28NWPR04 | 0.9 | 0.2 | 0.3 | 0.3 |
| 28NWPR05 | 0.9 | 0.2 | 0.2 | 0.3 |
| 28NWPR06 | 0.9 | 0.3 | 0.2 | 0.2 |
| 28NWPR07 | 0.9 | 0.6 | 0.3 | 0.3 |
| 28NWPR08 | 0.9 | 0.3 | 0.3 | 0.3 |
| 28NWPR09 | 0.9 | 0.4 | 0.2 | 0.3 |
| 28NWPR31 | 0.6 | 0.0 | 0.2 | 0.3 |
| 28STCC01 | 0.9 | 0.4 | 0.3 | 0.3 |
| 28STCC02 | 0.9 | 0.3 | 0.3 | 0.3 |
| 28STCC03 | 0.9 | 0.3 | 0.2 | 0.2 |
| 28STCC04 | 0.9 | 0.3 | 0.3 | 0.3 |
| 28STCC05 | 0.9 | 0.2 | 0.3 | 0.3 |
| 28STCC10 | 0.9 | 0.4 | 0.2 | 0.3 |
| 28STCC11 | 0.8 | 0.2 | 0.2 | 0.3 |
| 28STCC51 | 0.3 | 0.0 | 0.3 | 0.3 |
| 28STCC68 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28STCE01 | 0.9 | 0.4 | 0.3 | 0.3 |
| 28STCE02 | 0.9 | 0.3 | 0.3 | 0.3 |
| 28STCE03 | 0.9 | 0.6 | 0.3 | 0.4 |
| 28STCE05 | 0.9 | 0.6 | 0.4 | 0.5 |
| 28STCE07 | 0.9 | 0.2 | 0.4 | 0.5 |
| 28STCE08 | 0.9 | 0.5 | 0.3 | 0.4 |
| 28STCE09 | 0.9 | 0.6 | 0.3 | 0.3 |
| 28STCE53 | 0.3 | 0.0 | 0.3 | 0.3 |
| 28STCE59 | 0.8 | 0.2 | 0.3 | 0.4 |
| 28STHC01 | 0.6 | 0.0 | 0.3 | 0.4 |
| 28STHC03 | 0.7 | 0.0 | 0.3 | 0.4 |
| 28STHC05 | 0.8 | 0.1 | 0.3 | 0.3 |
| 28STHC09 | 0.9 | 0.3 | 0.2 | 0.3 |
| 28STHC15 | 0.6 | 0.0 | 0.2 | 0.3 |
| 28STHC16 | 0.8 | 0.2 | 0.3 | 0.4 |
| 28STHC17 | 0.6 | 0.0 | 0.3 | 0.4 |
| 28STOP01 | 0.9 | 0.2 | 0.3 | 0.4 |
| 28STOP02 | 0.8 | 0.0 | 0.2 | 0.3 |
| 28STOP03 | 0.8 | 0.1 | 0.2 | 0.3 |
| 28STOP04 | 0.9 | 0.3 | 0.3 | 0.3 |
| 28STOP05 | 0.8 | 0.1 | 0.2 | 0.3 |
| 28STOP06 | 0.8 | 0.1 | 0.2 | 0.3 |
| 28STOP07 | 0.9 | 0.4 | 0.2 | 0.3 |
| 28STOP08 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28STOP09 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28STOP11 | 0.7 | 0.1 | 0.1 | 0.2 |
| 28STOP12 | 0.5 | 0.0 | 0.2 | 0.2 |
| 28STOP14 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28STOP15 | 0.6 | 0.0 | 0.2 | 0.2 |
| 28STOP17 | 0.8 | 0.1 | 0.2 | 0.2 |
| 28STOP19 | 0.4 | 0.0 | 0.2 | 0.3 |
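As a rough sketch of what these per-sample statistics represent, the function below computes the proportion of missing genotypes and the observed heterozygosity from one sample's VCF `GT` strings. Defining heterozygosity as heterozygous calls divided by non-missing calls is an assumption; the pipeline's SAMPLE_SUMMARY step may use a different definition, and the genotype list is hypothetical.

```python
def sample_stats(genotypes):
    """Summarise one sample's VCF GT strings (e.g. "0/1", "1|1", "./.").

    Returns the proportion of missing calls and the proportion of
    non-missing calls that are heterozygous.
    """
    missing = het = called = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            # Any missing allele makes the whole call missing.
            missing += 1
            continue
        called += 1
        if len(set(alleles)) > 1:
            het += 1
    total = len(genotypes)
    return {
        "missing": missing / total if total else float("nan"),
        "heterozygosity": het / called if called else float("nan"),
    }


# Hypothetical genotype calls for a single sample (illustration only).
stats = sample_stats(["0/1", "./.", "1/1", "0|1"])
print(stats)
```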
Software Versions
Software Versions lists versions of software tools extracted from file contents.
| Group | Software | Version |
|---|---|---|
| ADMIXTUREPIPELINE | AdmixPipe | 3.2 |
| ADMIXTUREPIPELINE | Admixture | 1.3 |
| ADMIXTUREPIPELINE | PLINK | 20220402 |
| ADMIXTUREPIPELINE | VCFtools | 0.1.16 |
| BCFTOOLS_QUERY_PRE | bcftools | 1.21 |
| BESTK | awk | null |
| CLUMPAK | AdmixPipe | 3.2 |
| CLUMPAK | CLUMPAK | 1.1 |
| COMPARE_ADMIXTURE | plotly | 6.1.2 |
| CVSUM | AdmixPipe | 3.2 |
| FILTER_POSITIONS | bcftools | 1.21 |
| INFER_POPULATIONS | awk | null |
| INFOCALC | infocalc | 1.1 |
| INFOCALC | perl | 5.32.1 |
| LIST_CHROMS | tabix | 1.21 |
| PLOT_CV | plotly | 6.1.2 |
| SAMPLE_SUMMARY | pandas | 2.2.3 |
| SNPIO_CONVERT_STRUCTURE | SNPio | 1.3.6 |
| SNPIO_FILTER | SNPio | 1.3.6 |
| TABIX_BGZIP | tabix | 1.21 |
| Workflow | Nextflow | 24.4.2 |
| Workflow | aCaMEL/gtseqdesign | 1.0.dev0 |
aCaMEL/gtseqdesign Methods Description
Suggested text and references to use when describing pipeline usage within the methods section of a publication.
Methods
Data was processed using aCaMEL/gtseqdesign v1.0dev of the nf-core collection of workflows (Ewels et al., 2020), utilising reproducible software environments from the Bioconda (Grüning et al., 2018) and Biocontainers (da Veiga Leprevost et al., 2017) projects.
The pipeline was executed with Nextflow v24.04.2 (Di Tommaso et al., 2017) with the following command:
nextflow run ../../gtseqdesign/main.nf -profile docker,arm --input ../gtseq_trials/COL.filter.nremover.vcf --reference ../gtseq_trials/COL.loci --popmap ../gtseq_trials/COL.popmap --fully_contained --primer_length 75 --ranking_metric I_n --max_candidates 500 --min_maf 0.2 --maxk 10 --outdir results_noind -resume -c custom.config --ind_cov 0.95 --snp_cov 0.5
References
- Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. doi: 10.1038/nbt.3820
- Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276–278. doi: 10.1038/s41587-020-0439-x
- Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: 10.1038/s41592-018-0046-7
- da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: 10.1093/bioinformatics/btx192
Notes:
- If available, make sure to update the text to include the Zenodo DOI of the version of the pipeline used.
- The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!
- You should also cite all software used within this run. Check the "Software Versions" section of this report for version information.
aCaMEL/gtseqdesign Workflow Summary
- This information is collected when the pipeline is started.
Core Nextflow options
- runName: romantic_payne
- containerEngine: docker
- launchDir: /Users/tyler/projects/COL/gtseq_final
- workDir: /Users/tyler/projects/COL/gtseq_final/work
- projectDir: /Users/tyler/projects/gtseqdesign
- userName: tyler
- profile: docker,arm
- configFiles: N/A
Input/output options
- input: ../gtseq_trials/COL.filter.nremover.vcf
- outdir: results_noind
- reference: ../gtseq_trials/COL.loci
- popmap: ../gtseq_trials/COL.popmap
Runtime parameters
- ind_cov: 0.95
- fully_contained: true
- min_maf: 0.2
- ranking_metric: I_n
Max job request options
- max_cpus: 8
- max_memory: 16.GB